Jargon is an innovative Chrome extension (Chrome Web Store, Official Website) created by my friend that transforms English web content into learning opportunities using generative AI technology. Launched in June 2024, Jargon offers two types of learning experiences: foreign language learning (Spanish, Chinese, etc.) and English style adaptation (GRE vocabulary, TikTok slang, etc.).
Figure 1: User Settings Interface showing customization options
Key Features
Language Selection
Supports everything from foreign languages like Spanish and Chinese to English variations such as TikTok Slang
Learning Goals
• Difficulty: Easy-Hard (1-10)
• Daily Target: 10-100 questions
Question Density
Controls percentage of eligible sentences (0-100%) highlighted for practice on each webpage
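The extension's actual selection logic is not public, but the density setting can be pictured as a per-sentence random draw. The sketch below is a minimal stdlib-only illustration of that idea; the function name and sample sentences are invented for demonstration.

```python
import random

def select_sentences(sentences, density, seed=None):
    """Keep roughly `density`% of eligible sentences (density: 0-100).

    Illustrative only: a hypothetical stand-in for whatever selection
    rule the extension actually applies.
    """
    rng = random.Random(seed)
    return [s for s in sentences if rng.random() * 100 < density]

sentences = ["The sky is blue.", "Water boils at 100 C.", "Cats sleep a lot."]
print(select_sentences(sentences, 100))  # density 100 keeps every sentence
print(select_sentences(sentences, 0))    # density 0 keeps none
```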
Display Settings
• Text Style: Highlight or underline
• Site Controls: Enable/disable per website or temporarily
Figure 2a: Highlight Style - Text appears with background color emphasis
Figure 2b: Underline Style - Text appears with underline emphasis
Figure 3: Question Generation Process - Users select text from any webpage to create practice questions
The GRE mode enhances vocabulary learning by replacing common words with their more sophisticated alternatives (e.g., “good” becomes “exemplary”), while TikTok style transforms formal English into contemporary social media expressions (e.g., “That’s cool” becomes “That’s bussin fr fr”). These AI-powered transformations maintain the original meaning while adapting to different language registers.
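In the real extension these transformations come from a generative model; a static lookup table like the one below only illustrates the substitution idea. Only "good" → "exemplary" comes from the text above; the other entries are invented examples.

```python
# Toy register-substitution sketch. The actual rewriting is AI-driven and
# context-aware; this dictionary is a deliberately crude approximation.
GRE_SUBSTITUTIONS = {
    "good": "exemplary",   # example from the description above
    "bad": "egregious",    # invented entry
    "big": "prodigious",   # invented entry
}

def to_gre_register(sentence):
    """Replace common words with more sophisticated alternatives."""
    return " ".join(
        GRE_SUBSTITUTIONS.get(w.lower(), w) for w in sentence.split()
    )

print(to_gre_register("this good plan"))  # -> "this exemplary plan"
```

A model-based rewrite would also handle grammar and collocation (e.g. adjusting articles), which a word-level lookup cannot.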
After 10 months of operation and 93 registered users, this analysis investigates three key aspects of user behavior.
The data for this analysis was collected from Jargon’s Supabase database, covering user interactions from the extension’s launch in June 2024 through March 16, 2025. The dataset comprises five main tables:
| Dataset | Records | Description |
|---|---|---|
| Profiles | 92 | User profiles and settings |
| Questions | 2442 | Generated practice questions |
| Words | 1594 | Vocabulary entries and translations |
| Levels | 117 | User progression through difficulty levels |
| Websites | 27 | Websites where extension was disabled |
Profiles Table

| Variable | Type | Description | Notes |
|---|---|---|---|
| user_id | Primary Key | Unique identifier for each user | Anonymized identifier |
| level | Integer | Current proficiency level | Range: 1-10 |
| paused | Boolean | Extension status on Chrome | TRUE/FALSE (Default: TRUE) |
| chrome_notifs | Boolean | Notification preferences | TRUE/FALSE |
| language | String | Current selected language mode | e.g., ‘GRE Vocabulary’, ‘TikTok Slang’ |
| last_question_time | DateTime | Timestamp of most recent question | UTC timezone |
| week_streak | Integer | Consecutive weeks of activity | |
| daily_streak | Integer | Consecutive days of activity | |
| daily_progress | Integer | Questions completed today | Resets daily |
| daily_goal | Integer | Target questions per day | User-set goal |
| density | Integer | Frequency of questions | Percentage of eligible sentences shown (0-100) |
| highlightStyle | String | Text selection preference | ‘highlight’ or ‘underline’ |
Questions Table

| Variable | Type | Description | Notes |
|---|---|---|---|
| question_id | Primary Key | Unique question identifier | |
| user_id | Foreign Key | Associated user | References profiles |
| created_at | DateTime | Question generation time | UTC timezone |
| sentence | Text | Original selected text | English source content |
| word | String | Target word for learning | |
| language | String | Transformation mode | Selected language mode |
| original_sentence | Text | Source text | Pre-transformation content |
| options_array | Array of String | Multiple choice options | Even indices: options in target language; Odd indices: English translations |
| answered_at | DateTime | Completion timestamp | NULL if unanswered |
| chosen_option | String | User’s answer | NULL if unanswered |
| user_rating | Integer | Question quality rating | Feature not yet implemented |
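The interleaved layout of `options_array` (even indices: target-language options; odd indices: English translations) can be unpacked with simple slicing. The sample values below are invented for demonstration.

```python
# Illustrative unpacking of `options_array` per the schema above.
options_array = ["ejemplar", "exemplary", "bueno", "good", "malo", "bad"]

target_options = options_array[0::2]   # even indices: target language
translations   = options_array[1::2]   # odd indices: English translations
pairs = list(zip(target_options, translations))
print(pairs)  # [('ejemplar', 'exemplary'), ('bueno', 'good'), ('malo', 'bad')]
```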
Words Table

| Variable | Type | Description | Notes |
|---|---|---|---|
| created_at | DateTime | Word entry timestamp | UTC timezone |
| word | String | Target vocabulary | |
| language | String | Language mode | |
| user_id | Foreign Key | Associated user | References profiles |
| translation | Text | English translation | AI-generated translation |
| status | String | Learning status | Currently all set to ‘learning’ |
Levels Table

| Variable | Type | Description | Notes |
|---|---|---|---|
| user_id | Foreign Key | Associated user | References profiles |
| language | String | Language mode | |
| level | Integer | Difficulty level | Range: 1-10 |
Websites Table

| Variable | Type | Description | Notes |
|---|---|---|---|
| user_id | Foreign Key | Associated user | References profiles |
| website | String | Blocked URL | Sites where Jargon is disabled |
Profile Enhancement
| Variable | Calculation | Purpose |
|---|---|---|
| generated_questions | Count of questions per user | Measure overall engagement |
| answered_questions | Count of questions with answers | Measure learning completion |
| blocked_sites | Count of blocked websites | Understand avoidance patterns |
| levels_attempted | Count of unique combination of languages and difficulty levels | Track learning progression |
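The derived variables above can be computed directly from the raw tables. The sketch below uses invented toy rows shaped like the Questions, Websites, and Levels tables; the actual analysis pipeline is not shown in this document, so this is only one plausible implementation.

```python
from collections import Counter

# Toy rows mimicking the schemas above (values invented).
questions = [
    {"user_id": "u1", "answered_at": "2025-01-02T10:00:00Z"},
    {"user_id": "u1", "answered_at": None},   # generated but unanswered
    {"user_id": "u2", "answered_at": "2025-01-03T09:00:00Z"},
]
blocked = [{"user_id": "u1", "website": "salesforce.com"}]
levels = [
    {"user_id": "u1", "language": "Spanish", "level": 1},
    {"user_id": "u1", "language": "Spanish", "level": 2},
    {"user_id": "u1", "language": "Spanish", "level": 2},  # duplicate combo
]

generated_questions = Counter(q["user_id"] for q in questions)
answered_questions = Counter(
    q["user_id"] for q in questions if q["answered_at"] is not None
)
blocked_sites = Counter(b["user_id"] for b in blocked)
# Unique (language, level) combinations per user.
levels_attempted = {
    u: len({(r["language"], r["level"]) for r in levels if r["user_id"] == u})
    for u in {r["user_id"] for r in levels}
}

print(generated_questions["u1"], answered_questions["u1"],
      blocked_sites["u1"], levels_attempted["u1"])  # 2 1 1 2
```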
Our exploratory data analysis examines patterns that inform both research questions about usage context and feature adoption. We organize our exploration into four main categories:
Figure 5: Website Usage Analysis - Distribution of blocked websites by category (left) and frequency of individual websites (right)
The analysis of blocked websites reveals distinct patterns in how users interact with the Jargon extension. Professional tools—particularly Salesforce and AI platforms—are the most frequently blocked, suggesting that users tend to avoid using Jargon during work-related activities. The presence of development environment blocks indicates that some users are technical professionals, though this group represents only a modest portion of the overall user base. Educational content also features prominently among blocked websites, with users often disabling the extension on documentation sites and learning platforms, possibly to maintain focus during concentrated study sessions.
However, only 27 sites were blocked across 92 users. This limited uptake suggests the blocking feature is not widely used, and the current data may not be conclusive; these findings should be generalized with caution, as they may not represent the broader user population.
Figure 6: Scatter plot showing the relationship between user adoption and question generation across different language modes
The scatter plot highlights key patterns in language mode usage.
Overall, while usage intensity and adoption vary widely across languages, traditional language learning modes drive most activity.
Figure 7: Word frequency analysis showing common words (left) and word pairs (right) in learning content. Colors indicate frequency of occurrence, with darker shades representing higher frequencies.
The word and phrase frequency analysis is based on the English original sentences selected for content generation.
Overall, the word frequency analysis reveals that users are engaging most with scientific and descriptive content, focusing on process-oriented vocabulary and recurring technical terms.
Figure 8: Daily activity patterns showing question generation and active users with their respective averages (dashed lines) over the observation period, based on UTC timezone. Questions average: 12.5 per day; Users average: 2.2 per day.
Figure 9: Weekly activity patterns showing average questions generated and active users by day of week (UTC timezone), with error bars indicating standard error.
The temporal analysis reveals several key patterns in user engagement, based on both daily and weekly activity (all timestamps in UTC):
Daily Trends: Question generation and active user counts fluctuate considerably day-to-day, with occasional spikes (up to 200 questions or 12 users), but most days remain below the averages (12.5 questions, 2.2 users). This indicates a small but steady user base, with 1–5 active users on most days.
Weekly Trends: Question generation is highest on Mondays, Tuesdays, and Wednesdays, then tapers off toward the weekend, suggesting users are more engaged during the workweek. There is substantial variability across days, as shown by the error bars.
Together, these patterns indicate that Jargon’s usage is characterized by low but regular engagement, with activity peaking midweek and significant day-to-day variability. This suggests a core group of users who interact with the platform most during the workweek.
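The day-of-week aggregation behind the weekly trends can be sketched with the standard library alone. Timestamps are interpreted in UTC to match the analysis; the timestamps below are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Invented question-creation timestamps (ISO 8601, assumed UTC).
timestamps = [
    "2025-03-03T09:00:00",  # a Monday
    "2025-03-03T15:00:00",  # same Monday
    "2025-03-08T11:00:00",  # a Saturday
]

per_weekday = defaultdict(lambda: defaultdict(int))  # weekday -> date -> count
for ts in timestamps:
    dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    per_weekday[dt.strftime("%A")][dt.date()] += 1

# Average questions per day, grouped by day of week (as in Figure 9).
avg_by_weekday = {day: mean(counts.values())
                  for day, counts in per_weekday.items()}
print(avg_by_weekday)  # Monday averages 2 questions/day, Saturday 1
```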
Figure 10: Distribution of key engagement metrics across users, showing individual violin plots for each metric with median and interquartile range (IQR) statistics. Each plot uses a distinct color and includes summary statistics.
The violin plots provide a clearer view of the distribution of user engagement metrics.
Overall, the violin plots highlight that engagement is highly skewed: most users interact minimally, while a small subset are much more active or exploratory. This pattern is consistent across all four metrics.
To further address our first research question (“What are the common contexts and platforms where users engage with Jargon?”), we performed sentiment analysis on the English original sentences that users selected for learning. Using the syuzhet package in R, each sentence was assigned a sentiment score, where positive values indicate positive sentiment, negative values indicate negative sentiment, and values near zero indicate neutral sentiment. This approach allows us to quantitatively assess the emotional tone of the content users choose to engage with.
Figure 11: Stacked bar graph showing the frequency of user-selected sentences in each sentiment category, stacked by language mode (top 5 languages shown in color, all others in grey).
The stacked bar graph shows the overall distribution of sentiment categories, with the top 5 language modes highlighted in color and all other languages grouped in grey. This visualization highlights both the predominance of neutral and slightly negative content and the relative engagement of different language modes—including less common ones—across sentiment categories.
To further explore the contexts in which users engage with Jargon, we applied Latent Dirichlet Allocation (LDA) topic modeling to the English original sentences selected by users. In addition to standard stopwords, we removed a custom list of common or uninformative words to improve topic quality. This method uncovers the main themes or topics present in the content users choose to learn from.
Figure 12: Top terms for each topic identified by LDA topic modeling of user-selected sentences. Each panel shows the most important words for one topic, with x-axis numbering visible for all.
The LDA topic modeling did not yield strong or actionable insights about the contexts or platforms where users engage with Jargon. The “Importance (beta)” values are all quite low (well below 0.05), which is typical for LDA on short texts or small datasets, but it also means that no single word dominates any topic. The topics identified are diffuse, with mostly generic or process-oriented terms. This suggests that either the user-selected content is too varied or generic for topic modeling to be effective, or that the dataset is not large or rich enough for LDA to find meaningful structure. This is a valid finding: not all analyses reveal clear patterns, and reporting this transparently demonstrates scientific rigor. It may also indicate that user engagement with Jargon is broad and not easily categorized, or that more data is needed for deeper insights.
Despite the weak themes, the topics admit tentative interpretations; however, given the low importance values and the generic nature of the terms, these remain speculative.